Overview

Brought to you by YData

Dataset statistics

Number of variables: 34
Number of observations: 19372
Missing cells: 371611
Missing cells (%): 56.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 25.8 MiB
Average record size in memory: 1.4 KiB
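
As a quick sanity check, the missing-cell percentage and the average record size can be recomputed from the other figures in this table. The sketch below is illustrative only: the variable names are ours, and it assumes the report divides missing cells by the total cell count (variables × observations) and total memory by the number of observations.

```python
# Figures copied from the Dataset statistics table.
n_variables = 34
n_observations = 19372
missing_cells = 371611
total_memory_mib = 25.8

# Assumption: missing % = missing cells / (variables * observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / observations (1 MiB = 1024 KiB).
avg_record_kib = total_memory_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Under those assumptions the recomputed values round to the 56.4% and 1.4 KiB shown in the table.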

Variable types

Text: 7
Numeric: 3
Categorical: 16
Boolean: 6
Unsupported: 2

Alerts

WHOLE_GENOME_SCREEN has constant value "False" Constant
REARRANGEMENT_SCREEN has constant value "False" Constant
MSI has constant value "Unknown" Constant
CYTOGENETICS is highly overall correlated with GENDER and 5 other fields High correlation
ENVIRONMENTAL_VARIABLES is highly overall correlated with INDIVIDUAL_ID and 5 other fields High correlation
ETHNICITY is highly overall correlated with INDIVIDUAL_ID and 9 other fields High correlation
FAMILY is highly overall correlated with GERMLINE_MUTATION and 9 other fields High correlation
GENDER is highly overall correlated with CYTOGENETICS and 3 other fields High correlation
GERMLINE_MUTATION is highly overall correlated with FAMILY and 7 other fields High correlation
GRADE is highly overall correlated with FAMILY and 9 other fields High correlation
INDIVIDUAL_ID is highly overall correlated with CYTOGENETICS and 12 other fields High correlation
INDIVIDUAL_REMARK is highly overall correlated with GENDER and 11 other fields High correlation
METASTATIC_SITE is highly overall correlated with CYTOGENETICS and 6 other fields High correlation
MUTATION_ALLELE_SPECIFICATION is highly overall correlated with ETHNICITY and 9 other fields High correlation
NORMAL_TISSUE_TESTED is highly overall correlated with ETHNICITY and 7 other fields High correlation
RNASEQ_SCREEN is highly overall correlated with CYTOGENETICS and 11 other fields High correlation
SAMPLE_TYPE is highly overall correlated with CYTOGENETICS and 5 other fields High correlation
STAGE is highly overall correlated with GENDER and 8 other fields High correlation
TARGETED_SCREEN is highly overall correlated with ENVIRONMENTAL_VARIABLES and 10 other fields High correlation
THERAPY is highly overall correlated with ETHNICITY and 9 other fields High correlation
TUMOUR_ID is highly overall correlated with CYTOGENETICS and 12 other fields High correlation
TUMOUR_REMARK is highly overall correlated with FAMILY and 10 other fields High correlation
TUMOUR_SOURCE is highly overall correlated with ENVIRONMENTAL_VARIABLES and 8 other fields High correlation
WHOLE_EXOME_SCREEN is highly overall correlated with ENVIRONMENTAL_VARIABLES and 10 other fields High correlation
SAMPLE_TYPE is highly imbalanced (53.6%) Imbalance
WHOLE_EXOME_SCREEN is highly imbalanced (97.1%) Imbalance
TARGETED_SCREEN is highly imbalanced (96.8%) Imbalance
RNASEQ_SCREEN is highly imbalanced (99.5%) Imbalance
GRADE is highly imbalanced (89.4%) Imbalance
STAGE is highly imbalanced (73.8%) Imbalance
THERAPY is highly imbalanced (74.8%) Imbalance
FAMILY is highly imbalanced (71.3%) Imbalance
NORMAL_TISSUE_TESTED has 17683 (91.3%) missing values Missing
AGE has 15740 (81.3%) missing values Missing
THERAPY_RELATIONSHIP has 16397 (84.6%) missing values Missing
SAMPLE_DIFFERENTIATOR has 19303 (99.6%) missing values Missing
MUTATION_ALLELE_SPECIFICATION has 19266 (99.5%) missing values Missing
AVERAGE_PLOIDY has 19372 (100.0%) missing values Missing
SAMPLE_REMARK has 19372 (100.0%) missing values Missing
DRUG_RESPONSE has 17537 (90.5%) missing values Missing
GRADE has 19231 (99.3%) missing values Missing
AGE_AT_TUMOUR_RECURRENCE has 19335 (99.8%) missing values Missing
STAGE has 19097 (98.6%) missing values Missing
CYTOGENETICS has 19330 (99.8%) missing values Missing
METASTATIC_SITE has 18912 (97.6%) missing values Missing
TUMOUR_REMARK has 18428 (95.1%) missing values Missing
ETHNICITY has 17527 (90.5%) missing values Missing
ENVIRONMENTAL_VARIABLES has 19284 (99.5%) missing values Missing
GERMLINE_MUTATION has 19356 (99.9%) missing values Missing
THERAPY has 18066 (93.3%) missing values Missing
FAMILY has 19240 (99.3%) missing values Missing
INDIVIDUAL_REMARK has 19135 (98.8%) missing values Missing
COSMIC_SAMPLE_ID has unique values Unique
AVERAGE_PLOIDY is an unsupported type, check if it needs cleaning or further analysis Unsupported
SAMPLE_REMARK is an unsupported type, check if it needs cleaning or further analysis Unsupported
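
The Imbalance percentages in the alerts appear consistent with one minus the normalised Shannon entropy of the class counts. That is an inference from the numbers in this report, not something the report states. The sketch below, with a hypothetical helper `imbalance_score`, applies that formula to the Boolean screen counts listed later in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for balanced classes, near 1 when one class dominates."""
    k = len(counts)
    total = sum(counts)
    if k < 2 or total <= 0:
        raise ValueError("need at least two classes and a positive total")
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts copied from the Boolean variables in this report.
screens = {
    "WHOLE_EXOME_SCREEN": [19316, 56],
    "TARGETED_SCREEN": [19309, 63],
    "RNASEQ_SCREEN": [19365, 7],
}
for name, counts in screens.items():
    print(f"{name}: {100 * imbalance_score(counts):.1f}%")
```

With these counts the sketch gives roughly 97.1%, 96.8% and 99.5%, which agrees with the three corresponding Imbalance alerts.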

Reproduction

Analysis started: 2025-07-15 00:50:38.700461
Analysis finished: 2025-07-15 00:50:43.836070
Duration: 5.14 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
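
A report of this kind can usually be regenerated with a few lines of Python. The following is a minimal sketch, not the configuration actually used here: the function name, the input path, the tab-separated format and the report title are all assumptions, and the settings in config.json are not reproduced.

```python
def build_profile_report(table_path, output_path="report.html"):
    """Profile a delimited table and write an HTML report (illustrative helper)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the sample table is tab-separated; adjust `sep` if it is not.
    df = pd.read_csv(table_path, sep="\t")

    # Default settings; the original run may have used a custom configuration.
    profile = ProfileReport(df, title="Sample table profile")
    profile.to_file(output_path)
    return output_path
```

Calling `build_profile_report("samples.tsv")`, where `samples.tsv` is a placeholder file name, would write `report.html` next to the script.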

Variables

COSMIC_SAMPLE_ID
Text

Unique 

Distinct: 19372
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 MiB

Length

Max length: 11
Median length: 11
Mean length: 10.84617
Min length: 10

Characters and Unicode

Total characters: 210112
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
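
To illustrate what such character properties look like, the sketch below uses Python's standard `unicodedata` module to group the characters of a few sample identifiers by Unicode general category and to count the most frequent characters. The helper names are ours. The report itself lists the category, script and block as (unknown), so this output is not expected to match those tables.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category (e.g. 'Lu', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)

def most_common_chars(values, n=3):
    """Return the n most frequent characters across all strings in `values`."""
    return Counter("".join(values)).most_common(n)

# The five sample rows shown for COSMIC_SAMPLE_ID.
sample_ids = ["COSS1496935", "COSS1036427", "COSS1496985", "COSS2468028", "COSS820950"]

print(dict(category_counts(sample_ids[0])))  # uppercase letters vs. decimal digits
print(most_common_chars(sample_ids))
```

For an identifier such as COSS1496935, the four letters fall into category Lu (uppercase letter) and the seven digits into Nd (decimal number).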

Unique

Unique: 19372
Unique (%): 100.0%

Sample

1st row: COSS1496935
2nd row: COSS1036427
3rd row: COSS1496985
4th row: COSS2468028
5th row: COSS820950

Value | Count | Frequency (%)
coss1930149 | 1 | < 0.1%
coss1182421 | 1 | < 0.1%
coss1496935 | 1 | < 0.1%
coss1036427 | 1 | < 0.1%
coss1496985 | 1 | < 0.1%
coss2468028 | 1 | < 0.1%
coss820950 | 1 | < 0.1%
coss1731941 | 1 | < 0.1%
coss1384396 | 1 | < 0.1%
coss1544835 | 1 | < 0.1%
Other values (19362) | 19362 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
S | 38744 | 18.4%
1 | 22767 | 10.8%
2 | 20382 | 9.7%
O | 19372 | 9.2%
C | 19372 | 9.2%
9 | 14989 | 7.1%
0 | 13345 | 6.4%
3 | 11364 | 5.4%
8 | 10954 | 5.2%
4 | 10309 | 4.9%
Other values (3) | 28514 | 13.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 210112 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 210112 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 210112 | 100.0%

Distinct: 19269
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Length

Max length: 17
Median length: 7
Mean length: 6.8540161
Min length: 1

Characters and Unicode

Total characters: 132776
Distinct characters: 40
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 19221
Unique (%): 99.2%

Sample

1st row: 1496935
2nd row: 1036427
3rd row: 1496985
4th row: 2468028
5th row: E17588

Value | Count | Frequency (%)
10 | 6 | < 0.1%
8 | 5 | < 0.1%
9 | 5 | < 0.1%
5 | 5 | < 0.1%
6 | 5 | < 0.1%
1 | 5 | < 0.1%
7 | 5 | < 0.1%
4 | 5 | < 0.1%
11 | 4 | < 0.1%
2 | 4 | < 0.1%
Other values (19257) | 19323 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 22613 | 17.0%
2 | 19059 | 14.4%
9 | 14888 | 11.2%
0 | 13317 | 10.0%
3 | 11998 | 9.0%
4 | 10208 | 7.7%
8 | 9627 | 7.3%
6 | 9472 | 7.1%
5 | 9146 | 6.9%
7 | 8939 | 6.7%
Other values (30) | 3509 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 132776 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 132776 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 132776 | 100.0%

Distinct: 94
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 MiB

Length

Max length: 13
Median length: 12
Mean length: 12.002426
Min length: 12

Characters and Unicode

Total characters: 232511
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20
Unique (%): 0.1%

Sample

1st row: COSO31355381
2nd row: COSO28815381
3rd row: COSO31355381
4th row: COSO36075381
5th row: COSO28775381

Value | Count | Frequency (%)
coso36605381 | 7704 | 39.8%
coso31355381 | 5178 | 26.7%
coso36075381 | 2672 | 13.8%
coso28815381 | 920 | 4.7%
coso36075385 | 553 | 2.9%
coso28815385 | 479 | 2.5%
coso28775381 | 367 | 1.9%
coso36075546 | 251 | 1.3%
coso36075763 | 209 | 1.1%
coso37735381 | 187 | 1.0%
Other values (84) | 852 | 4.4%

Most occurring characters

Value | Count | Frequency (%)
3 | 41806 | 18.0%
O | 38744 | 16.7%
5 | 26554 | 11.4%
1 | 24185 | 10.4%
8 | 22299 | 9.6%
6 | 20180 | 8.7%
C | 19372 | 8.3%
S | 19372 | 8.3%
0 | 11509 | 4.9%
7 | 5708 | 2.5%
Other values (3) | 2782 | 1.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 232511 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 232511 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 232511 | 100.0%

TUMOUR_ID
Real number (ℝ)

High correlation 

Distinct: 19229
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1703638.8
Minimum: 639922
Maximum: 2826849
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 302.7 KiB

Quantile statistics

Minimum: 639922
5th percentile: 740420.55
Q1: 1126645.8
Median: 1815975.5
Q3: 2294631.2
95th percentile: 2760479.5
Maximum: 2826849
Range: 2186927
Interquartile range (IQR): 1167985.5

Descriptive statistics

Standard deviation: 628388.95
Coefficient of variation (CV): 0.36885105
Kurtosis: -1.1446555
Mean: 1703638.8
Median Absolute Deviation (MAD): 529103
Skewness: 0.0022162929
Sum: 3.3002891 × 10^10
Variance: 3.9487268 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1064091 | 7 | < 0.1%
1079647 | 7 | < 0.1%
1079643 | 7 | < 0.1%
1079619 | 7 | < 0.1%
981099 | 6 | < 0.1%
1079633 | 6 | < 0.1%
1079637 | 6 | < 0.1%
1079627 | 5 | < 0.1%
1079644 | 5 | < 0.1%
1079621 | 5 | < 0.1%
Other values (19219) | 19311 | 99.7%

Value | Count | Frequency (%)
639922 | 1 | < 0.1%
683145 | 1 | < 0.1%
683146 | 1 | < 0.1%
683147 | 1 | < 0.1%
683148 | 1 | < 0.1%
683149 | 1 | < 0.1%
683150 | 1 | < 0.1%
683151 | 1 | < 0.1%
683152 | 1 | < 0.1%
683153 | 1 | < 0.1%

Value | Count | Frequency (%)
2826849 | 1 | < 0.1%
2826848 | 1 | < 0.1%
2826847 | 1 | < 0.1%
2811858 | 1 | < 0.1%
2811857 | 1 | < 0.1%
2808140 | 1 | < 0.1%
2808139 | 1 | < 0.1%
2808138 | 1 | < 0.1%
2808137 | 1 | < 0.1%
2808136 | 1 | < 0.1%

SAMPLE_TYPE
Categorical

High correlation  Imbalance 

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 MiB

Length

Max length: 20
Median length: 13
Mean length: 13.145003
Min length: 2

Characters and Unicode

Total characters: 254645
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: surgery-fixed
2nd row: surgery-fixed
3rd row: surgery-fixed
4th row: surgery-fixed
5th row: surgery-fixed

Common Values

Value | Count | Frequency (%)
surgery-fixed | 10572 | 54.6%
surgery - NOS | 5680 | 29.3%
surgery fresh/frozen | 1515 | 7.8%
fixed - NOS | 801 | 4.1%
NS | 624 | 3.2%
fine needle aspirate | 72 | 0.4%
fresh/frozen - NOS | 45 | 0.2%
autopsy-fixed | 26 | 0.1%
cell-line | 23 | 0.1%
circulating tumour | 5 | < 0.1%
Other values (3) | 9 | < 0.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
surgery-fixed | 10572 | 31.0%
surgery | 7195 | 21.1%
 | 6526 | 19.1%
nos | 6526 | 19.1%
fresh/frozen | 1560 | 4.6%
fixed | 801 | 2.3%
ns | 624 | 1.8%
fine | 72 | 0.2%
needle | 72 | 0.2%
aspirate | 72 | 0.2%
Other values (9) | 73 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
r | 38754 | 15.2%
e | 32705 | 12.8%
s | 19429 | 7.6%
u | 17816 | 7.0%
y | 17793 | 7.0%
g | 17776 | 7.0%
- | 17152 | 6.7%
(space) | 14721 | 5.8%
f | 14595 | 5.7%
i | 11576 | 4.5%
Other values (18) | 52328 | 20.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 254645 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 254645 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 254645 | 100.0%

INDIVIDUAL_ID
Real number (ℝ)

High correlation 

Distinct: 18249
Distinct (%): 94.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1618704.4
Minimum: 620961
Maximum: 2659684
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 302.7 KiB

Quantile statistics

Minimum: 620961
5th percentile: 722926.55
Q1: 1099513.8
Median: 1729265.5
Q3: 2146195.2
95th percentile: 2596013
Maximum: 2659684
Range: 2038723
Interquartile range (IQR): 1046681.5

Descriptive statistics

Standard deviation: 576460.23
Coefficient of variation (CV): 0.35612446
Kurtosis: -1.1041569
Mean: 1618704.4
Median Absolute Deviation (MAD): 466146
Skewness: -0.015385481
Sum: 3.1357542 × 10^10
Variance: 3.323064 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1181721 | 17 | 0.1%
1464316 | 10 | 0.1%
1099583 | 10 | 0.1%
1037922 | 8 | < 0.1%
2197216 | 8 | < 0.1%
1099581 | 8 | < 0.1%
887717 | 7 | < 0.1%
1053243 | 7 | < 0.1%
887714 | 7 | < 0.1%
1111312 | 7 | < 0.1%
Other values (18239) | 19283 | 99.5%

Value | Count | Frequency (%)
620961 | 1 | < 0.1%
665613 | 1 | < 0.1%
665614 | 1 | < 0.1%
665615 | 1 | < 0.1%
665616 | 1 | < 0.1%
665617 | 1 | < 0.1%
665618 | 1 | < 0.1%
665619 | 1 | < 0.1%
665620 | 1 | < 0.1%
665621 | 1 | < 0.1%

Value | Count | Frequency (%)
2659684 | 1 | < 0.1%
2659683 | 1 | < 0.1%
2659682 | 1 | < 0.1%
2645583 | 1 | < 0.1%
2645582 | 1 | < 0.1%
2642377 | 1 | < 0.1%
2642376 | 1 | < 0.1%
2642375 | 1 | < 0.1%
2642374 | 1 | < 0.1%
2642373 | 1 | < 0.1%

WHOLE_GENOME_SCREEN
Boolean

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.3 KiB

Value | Count | Frequency (%)
False | 19372 | 100.0%

WHOLE_EXOME_SCREEN
Boolean

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.3 KiB

Value | Count | Frequency (%)
False | 19316 | 99.7%
True | 56 | 0.3%

TARGETED_SCREEN
Boolean

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.3 KiB

Value | Count | Frequency (%)
True | 19309 | 99.7%
False | 63 | 0.3%

RNASEQ_SCREEN
Boolean

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.3 KiB

Value | Count | Frequency (%)
False | 19365 | > 99.9%
True | 7 | < 0.1%

REARRANGEMENT_SCREEN
Boolean

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.3 KiB

Value | Count | Frequency (%)
False | 19372 | 100.0%

TUMOUR_SOURCE
Categorical

High correlation 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Length

Max length: 10
Median length: 2
Mean length: 3.8925769
Min length: 2

Characters and Unicode

Total characters: 75407
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: primary
2nd row: NS
3rd row: primary
4th row: primary
5th row: primary

Common Values

Value | Count | Frequency (%)
NS | 12576 | 64.9%
primary | 5693 | 29.4%
recurrent | 621 | 3.2%
metastasis | 477 | 2.5%
secondary | 5 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
ns | 12576 | 64.9%
primary | 5693 | 29.4%
recurrent | 621 | 3.2%
metastasis | 477 | 2.5%
secondary | 5 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
r | 13254 | 17.6%
N | 12576 | 16.7%
S | 12576 | 16.7%
a | 6652 | 8.8%
m | 6170 | 8.2%
i | 6170 | 8.2%
y | 5698 | 7.6%
p | 5693 | 7.5%
e | 1724 | 2.3%
t | 1575 | 2.1%
Other values (6) | 3319 | 4.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 75407 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 75407 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 75407 | 100.0%

NORMAL_TISSUE_TESTED
Boolean

High correlation  Missing 

Distinct: 2
Distinct (%): 0.1%
Missing: 17683
Missing (%): 91.3%
Memory size: 189.2 KiB

Value | Count | Frequency (%)
True | 1115 | 5.8%
False | 574 | 3.0%
(Missing) | 17683 | 91.3%

GENDER
Categorical

High correlation 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 19372
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: u
2nd row: u
3rd row: u
4th row: m
5th row: u

Common Values

Value | Count | Frequency (%)
u | 15685 | 81.0%
m | 2050 | 10.6%
f | 1637 | 8.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
u | 15685 | 81.0%
m | 2050 | 10.6%
f | 1637 | 8.5%

Most occurring characters

Value | Count | Frequency (%)
u | 15685 | 81.0%
m | 2050 | 10.6%
f | 1637 | 8.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 19372 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 19372 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 19372 | 100.0%

AGE
Real number (ℝ)

Missing 

Distinct: 97
Distinct (%): 2.7%
Missing: 15740
Missing (%): 81.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.743868
Minimum: 0.33
Maximum: 96
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 302.7 KiB

Quantile statistics

Minimum: 0.33
5th percentile: 30
Q1: 50
Median: 61
Q3: 69
95th percentile: 80
Maximum: 96
Range: 95.67
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 15.274116
Coefficient of variation (CV): 0.26001209
Kurtosis: 0.64834094
Mean: 58.743868
Median Absolute Deviation (MAD): 9
Skewness: -0.72165915
Sum: 213357.73
Variance: 233.29862
Monotonicity: Not monotonic
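
Several of the derived AGE figures can be cross-checked from the basic ones. The sketch below is illustrative: the variable names are ours, and it assumes that the sum is taken over non-missing values only and that the usual definitions apply (CV = standard deviation / mean, IQR = Q3 − Q1, variance = standard deviation squared).

```python
# Figures copied from the AGE summary in this report.
n_total, n_missing = 19372, 15740
mean, std = 58.743868, 15.274116
q1, q3 = 50, 69
minimum, maximum = 0.33, 96

n_present = n_total - n_missing   # observations with a recorded age
cv = std / mean                   # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum
total = mean * n_present          # assumption: sum over non-missing values only

print(f"non-missing: {n_present}")
print(f"CV: {cv:.8f}, variance: {variance:.5f}")
print(f"IQR: {iqr}, range: {value_range:.2f}, sum: {total:.2f}")
```

The recomputed values agree with the reported CV, variance, IQR, range and sum up to rounding of the inputs.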
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
68 | 125 | 0.6%
65 | 118 | 0.6%
64 | 116 | 0.6%
66 | 112 | 0.6%
59 | 109 | 0.6%
54 | 108 | 0.6%
60 | 105 | 0.5%
62 | 102 | 0.5%
67 | 101 | 0.5%
52 | 99 | 0.5%
Other values (87) | 2537 | 13.1%
(Missing) | 15740 | 81.3%

Value | Count | Frequency (%)
0.33 | 1 | < 0.1%
5 | 1 | < 0.1%
6.9 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 2 | < 0.1%
9 | 2 | < 0.1%
10 | 8 | < 0.1%
11 | 6 | < 0.1%
12 | 13 | 0.1%
13 | 6 | < 0.1%

Value | Count | Frequency (%)
96 | 1 | < 0.1%
94 | 1 | < 0.1%
93 | 1 | < 0.1%
92 | 6 | < 0.1%
91 | 2 | < 0.1%
90 | 2 | < 0.1%
89 | 7 | < 0.1%
88 | 9 | < 0.1%
87 | 5 | < 0.1%
86 | 14 | 0.1%

THERAPY_RELATIONSHIP
Text

Missing 

Distinct: 52
Distinct (%): 1.7%
Missing: 16397
Missing (%): 84.6%
Memory size: 941.0 KiB

Length

Max length: 96
Median length: 39
Mean length: 38.43563
Min length: 27

Characters and Unicode

Total characters: 114346
Distinct characters: 49
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): 1.1%

Sample

1st row: Sample analysed before imatinib therapy
2nd row: Sample analysed before imatinib therapy
3rd row: Sample analysed during imatinib therapy
4th row: Sample analysed during imatinib therapy
5th row: Sample taken before imatinib therapy

Value | Count | Frequency (%)
sample | 2975 | 19.6%
therapy | 2965 | 19.5%
imatinib | 2938 | 19.4%
analysed | 2031 | 13.4%
before | 2020 | 13.3%
taken | 944 | 6.2%
after | 734 | 4.8%
during | 233 | 1.5%
and | 53 | 0.3%
months | 34 | 0.2%
Other values (54) | 240 | 1.6%

Most occurring characters

Value | Count | Frequency (%)
a | 14750 | 12.9%
e | 13829 | 12.1%
(space) | 12193 | 10.7%
i | 9219 | 8.1%
t | 7781 | 6.8%
n | 6391 | 5.6%
r | 6012 | 5.3%
m | 5977 | 5.2%
p | 5960 | 5.2%
l | 5035 | 4.4%
Other values (39) | 27199 | 23.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 114346 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 114346 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 114346 | 100.0%

SAMPLE_DIFFERENTIATOR
Text

Missing 

Distinct: 38
Distinct (%): 55.1%
Missing: 19303
Missing (%): 99.6%
Memory size: 761.4 KiB

Length

Max length: 86
Median length: 61
Mean length: 44.42029
Min length: 25

Characters and Unicode

Total characters: 3065
Distinct characters: 54
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 29
Unique (%): 42.0%

Sample

1st row: Sample from spindle cell component
2nd row: Sample from the spindle cell component
3rd row: Sample from dedifferentiated anaplastic component
4th row: Primary pattern (Sarcomatous spindle)
5th row: Sample from dedifferentiated epitheloid/pleomorphic component

Value | Count | Frequency (%)
sample | 53 | 13.6%
from | 43 | 11.1%
component | 43 | 11.1%
cell | 29 | 7.5%
spindle | 24 | 6.2%
the | 13 | 3.3%
of | 13 | 3.3%
epithelioid | 12 | 3.1%
pattern | 12 | 3.1%
dedifferentiated | 11 | 2.8%
Other values (64) | 136 | 35.0%

Most occurring characters

Value | Count | Frequency (%)
e | 335 | 10.9%
(space) | 321 | 10.5%
o | 239 | 7.8%
n | 199 | 6.5%
a | 189 | 6.2%
t | 184 | 6.0%
l | 180 | 5.9%
p | 176 | 5.7%
m | 176 | 5.7%
i | 166 | 5.4%
Other values (44) | 900 | 29.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3065 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3065 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3065 | 100.0%

MUTATION_ALLELE_SPECIFICATION
Categorical

High correlation  Missing 

Distinct: 9
Distinct (%): 8.5%
Missing: 19266
Missing (%): 99.5%
Memory size: 1.3 MiB

Length

Max length: 158
Median length: 70
Mean length: 78.915094
Min length: 38

Characters and Unicode

Total characters: 8365
Distinct characters: 48
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 4.7%

Sample

1st row: Secondary mutations located on same chromosome as the primary mutation
2nd row: Mutations located on single chromosome
3rd row: Secondary mutations located on same chromosome as the primary mutation
4th row: Secondary mutations located on same chromosome as the primary mutation
5th row: Mutations located on single chromosome

Common Values

Value | Count | Frequency (%)
Secondary mutations located on same chromosome as the primary mutation | 62 | 0.3%
Mutations located on single chromosome | 19 | 0.1%
Secondary mutations located on same chromosome as the primary mutation. Primary mutation p.V560D and e13 frameshift insertion located on different chromosomes | 18 | 0.1%
KIT mutations located on single chromosome | 2 | < 0.1%
KIT p.V559G primary mutation occurs on same chromosome as p.Y578C | 1 | < 0.1%
PDGFRA mutations located on different chromosomes | 1 | < 0.1%
Not known if KIT p.W557G primary mutation occurs on same chromosome as p.V569_Y578del | 1 | < 0.1%
KIT p.V559G primary mutation occurs on same chromosome as p.Y578C, and also with p.D579del | 1 | < 0.1%
Primary mutation p.V560D and e13 frameshift insertion located on different chromosomes | 1 | < 0.1%
(Missing) | 19266 | 99.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
on | 124 | 10.7%
located | 121 | 10.4%
chromosome | 104 | 9.0%
mutations | 102 | 8.8%
primary | 102 | 8.8%
mutation | 102 | 8.8%
same | 83 | 7.2%
as | 83 | 7.2%
secondary | 80 | 6.9%
the | 80 | 6.9%
Other values (21) | 178 | 15.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1053 | 12.6%
o | 926 | 11.1%
a | 713 | 8.5%
t | 669 | 8.0%
m | 637 | 7.6%
e | 608 | 7.3%
n | 509 | 6.1%
s | 475 | 5.7%
r | 469 | 5.6%
i | 406 | 4.9%
Other values (38) | 1900 | 22.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 8365 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 8365 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 8365 | 100.0%

MSI
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 135604
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unknown
2nd row: Unknown
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

ValueCountFrequency (%)
Unknown 19372
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
unknown 19372
100.0%

Most occurring characters

ValueCountFrequency (%)
n 58116
42.9%
U 19372
 
14.3%
k 19372
 
14.3%
o 19372
 
14.3%
w 19372
 
14.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 135604
100.0%


AVERAGE_PLOIDY
Unsupported

Missing  Rejected  Unsupported 

Missing19372
Missing (%)100.0%
Memory size302.7 KiB

SAMPLE_REMARK
Unsupported

Missing  Rejected  Unsupported 

Missing19372
Missing (%)100.0%
Memory size302.7 KiB
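
MSI holds a single value, and AVERAGE_PLOIDY and SAMPLE_REMARK are missing in every row, so none of the three can distinguish one record from another. A minimal pandas sketch of filtering such columns is given here. The helper name `drop_uninformative` and the toy frame are illustrative assumptions, not part of the profiled dataset or of the profiling tool.

```python
import pandas as pd


def drop_uninformative(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that are entirely missing or hold one distinct value."""
    all_missing = [c for c in df.columns if df[c].isna().all()]
    constant = [
        c for c in df.columns
        if c not in all_missing and df[c].nunique(dropna=False) == 1
    ]
    return df.drop(columns=all_missing + constant)


# Toy frame that mimics the three situations reported above (hypothetical rows).
toy = pd.DataFrame({
    "MSI": ["Unknown"] * 4,          # constant
    "AVERAGE_PLOIDY": [None] * 4,    # 100% missing
    "GRADE": ["3", None, "2", None], # sparse but informative
})
cleaned = drop_uninformative(toy)
```

Whether a constant column such as MSI should really be dropped depends on the downstream use; the sketch only shows how the alerts could be acted on.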

DRUG_RESPONSE
Text

Missing 

Distinct66
Distinct (%)3.6%
Missing17537
Missing (%)90.5%
Memory size886.8 KiB

Length

Max length389
Median length183
Mean length47.583106
Min length27

Characters and Unicode

Total characters87315
Distinct characters54
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique30 ?
Unique (%)1.6%

Sample

1st rowImatinib clinical primary non response (local progression)
2nd rowImatinib clinical resistant recurrence;Sunitinib clinical response - not further specified
3rd rowImatinib clinical resistant recurrence
4th rowImatinib clinical resistant recurrence
5th rowImatinib clinical response - not further specified
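
The second sample row packs two drug entries into one cell, separated by a semicolon. The following sketch shows one way to split such cells into one row per entry with pandas. Treating the first word of each entry as the drug name is an assumption based only on the sample rows shown above.

```python
import pandas as pd

responses = pd.Series(
    [
        "Imatinib clinical primary non response (local progression)",
        "Imatinib clinical resistant recurrence;Sunitinib clinical response - not further specified",
        None,
    ],
    name="DRUG_RESPONSE",
)

# One row per drug entry; missing cells are dropped before splitting.
entries = responses.dropna().str.split(";").explode().str.strip()

# Assumption: the first whitespace-delimited token is the drug name.
drugs = entries.str.split(n=1).str[0]
drug_counts = drugs.value_counts()
```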
ValueCountFrequency (%)
imatinib 1822
16.6%
clinical 1797
16.3%
response 1088
9.9%
resistant 796
7.2%
recurrence 779
7.1%
695
 
6.3%
not 672
 
6.1%
further 672
 
6.1%
specified 671
 
6.1%
non 372
 
3.4%
Other values (108) 1628
14.8%

Most occurring characters

ValueCountFrequency (%)
i 10784
12.4%
(space) 9181
10.5%
e 8420
9.6%
n 8253
9.5%
r 6605
 
7.6%
c 5973
 
6.8%
a 5532
 
6.3%
t 5481
 
6.3%
s 5172
 
5.9%
l 3986
 
4.6%
Other values (44) 17928
20.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 87315
100.0%


GRADE
Categorical

High correlation  Imbalance  Missing 

Distinct3
Distinct (%)2.1%
Missing19231
Missing (%)99.3%
Memory size1.3 MiB
Some Grade data are given in publication 138
3 2
2 1

Length

Max length40
Median length40
Mean length39.170213
Min length1

Characters and Unicode

Total characters5523
Distinct characters21
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.7%

Sample

1st rowSome Grade data are given in publication
2nd rowSome Grade data are given in publication
3rd rowSome Grade data are given in publication
4th rowSome Grade data are given in publication
5th rowSome Grade data are given in publication

Common Values

ValueCountFrequency (%)
Some Grade data are given in publication 138
 
0.7%
3 2
 
< 0.1%
2 1
 
< 0.1%
(Missing) 19231
99.3%
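
Of the 141 non-missing GRADE cells, 138 contain the placeholder sentence rather than a grade, which leaves three usable values. A hedged sketch of how the placeholder could be treated as missing before converting the column to numbers is shown here. The toy series is illustrative, and mapping the placeholder to missing is an assumption about the intended analysis.

```python
import pandas as pd

PLACEHOLDER = "Some Grade data are given in publication"

grade = pd.Series([PLACEHOLDER, "3", None, "2", "3"], name="GRADE")

# Replace the placeholder with NaN, then convert the remaining strings to numbers.
grade_clean = pd.to_numeric(grade.mask(grade == PLACEHOLDER), errors="coerce")
```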

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
some 138
14.2%
grade 138
14.2%
data 138
14.2%
are 138
14.2%
given 138
14.2%
in 138
14.2%
publication 138
14.2%
3 2
 
0.2%
2 1
 
0.1%

Most occurring characters

ValueCountFrequency (%)
(space) 828
15.0%
a 690
12.5%
e 552
10.0%
i 552
10.0%
n 414
 
7.5%
t 276
 
5.0%
r 276
 
5.0%
o 276
 
5.0%
d 276
 
5.0%
m 138
 
2.5%
Other values (11) 1245
22.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 5523
100.0%

Distinct29
Distinct (%)78.4%
Missing19335
Missing (%)99.8%
Memory size758.8 KiB

Length

Max length133
Median length70
Mean length33
Min length2

Characters and Unicode

Total characters1221
Distinct characters42
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique23 ?
Unique (%)62.2%

Sample

1st row33 months after initial diagnosis
2nd row3 years after primary diagnosis
3rd row3 years after primary diagnosis
4th row57 months after nilotinib therapy started
5th row24 months after initial diagnosis
ValueCountFrequency (%)
after 27
 
14.3%
months 21
 
11.1%
diagnosis 15
 
7.9%
therapy 11
 
5.8%
started 9
 
4.8%
years 8
 
4.2%
initial 8
 
4.2%
imatinib 6
 
3.2%
treatment 4
 
2.1%
15 4
 
2.1%
Other values (45) 76
40.2%

Most occurring characters

ValueCountFrequency (%)
(space) 153
12.5%
t 120
 
9.8%
i 114
 
9.3%
a 104
 
8.5%
s 86
 
7.0%
r 83
 
6.8%
e 83
 
6.8%
n 82
 
6.7%
o 54
 
4.4%
m 36
 
2.9%
Other values (32) 306
25.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1221
100.0%


STAGE
Categorical

High correlation  Imbalance  Missing 

Distinct7
Distinct (%)2.5%
Missing19097
Missing (%)98.6%
Memory size1.3 MiB
Some Stage data are given in publication 244
T3 16
IV 5
T4 5
T2 2
Other values (2) 3

Length

Max length40
Median length40
Mean length35.716364
Min length2

Characters and Unicode

Total characters9822
Distinct characters25
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%
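
The STAGE percentages can be reproduced from the counts above. The denominators used here are inferred from the reported figures, not taken from the tool's documentation: the missing percentage appears to be relative to all 19372 rows, while the distinct and unique percentages appear to be relative to the 275 non-missing rows.

```python
n_rows = 19372      # Number of observations (dataset overview)
missing = 19097     # STAGE: Missing
distinct = 7        # STAGE: Distinct
unique = 1          # STAGE: Unique

present = n_rows - missing  # non-missing cells

missing_pct = round(100 * missing / n_rows, 1)
distinct_pct = round(100 * distinct / present, 1)  # inferred denominator
unique_pct = round(100 * unique / present, 1)      # inferred denominator
```

Under these assumptions the three values come out as 98.6, 2.5 and 0.4, matching the figures reported for STAGE.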

Sample

1st rowSome Stage data are given in publication
2nd rowSome Stage data are given in publication
3rd rowSome Stage data are given in publication
4th rowSome Stage data are given in publication
5th rowSome Stage data are given in publication

Common Values

ValueCountFrequency (%)
Some Stage data are given in publication 244
 
1.3%
T3 16
 
0.1%
IV 5
 
< 0.1%
T4 5
 
< 0.1%
T2 2
 
< 0.1%
II 2
 
< 0.1%
IA 1
 
< 0.1%
(Missing) 19097
98.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
some 244
14.0%
stage 244
14.0%
data 244
14.0%
are 244
14.0%
given 244
14.0%
in 244
14.0%
publication 244
14.0%
t3 16
 
0.9%
iv 5
 
0.3%
t4 5
 
0.3%
Other values (3) 5
 
0.3%

Most occurring characters

ValueCountFrequency (%)
(space) 1464
14.9%
a 1220
12.4%
e 976
9.9%
i 976
9.9%
t 732
 
7.5%
n 732
 
7.5%
o 488
 
5.0%
g 488
 
5.0%
S 488
 
5.0%
d 244
 
2.5%
Other values (15) 2014
20.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 9822
100.0%


CYTOGENETICS
Categorical

High correlation  Missing 

Distinct3
Distinct (%)7.1%
Missing19330
Missing (%)99.8%
Memory size1.3 MiB
del(22) 35
ALK fusion negative 4
Normal 3

Length

Max length19
Median length7
Mean length8.0714286
Min length6

Characters and Unicode

Total characters339
Distinct characters23
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
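
This report lists every character under a single category labelled "(unknown)". As a rough illustration of what per-character properties look like, the sketch here uses Python's standard `unicodedata` module on the value `del(22)`. It covers general categories only, because the standard library does not expose scripts or blocks, and it is not a description of how the report itself derives its figures.

```python
import unicodedata
from collections import Counter

value = "del(22)"

# General category of each code point: Ll = lowercase letter, Nd = decimal digit,
# Ps = opening punctuation, Pe = closing punctuation.
categories = Counter(unicodedata.category(ch) for ch in value)
```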

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNormal
2nd rowdel(22)
3rd rowdel(22)
4th rowdel(22)
5th rowdel(22)

Common Values

ValueCountFrequency (%)
del(22) 35
 
0.2%
ALK fusion negative 4
 
< 0.1%
Normal 3
 
< 0.1%
(Missing) 19330
99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
del(22 35
70.0%
alk 4
 
8.0%
fusion 4
 
8.0%
negative 4
 
8.0%
normal 3
 
6.0%

Most occurring characters

ValueCountFrequency (%)
2 70
20.6%
e 43
12.7%
l 38
11.2%
d 35
10.3%
( 35
10.3%
) 35
10.3%
(space) 8
 
2.4%
n 8
 
2.4%
i 8
 
2.4%
a 7
 
2.1%
Other values (13) 52
15.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 339
100.0%


METASTATIC_SITE
Categorical

High correlation  Missing 

Distinct44
Distinct (%)9.6%
Missing18912
Missing (%)97.6%
Memory size1.3 MiB
liver 140
peritoneum 128
NS 75
omentum 14
stomach 12
Other values (39) 91

Length

Max length43
Median length22
Mean length6.9391304
Min length2

Characters and Unicode

Total characters3192
Distinct characters28
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique23 ?
Unique (%)5.0%

Sample

1st rowliver
2nd rowliver
3rd rowNS
4th rowperitoneum
5th rowstomach

Common Values

ValueCountFrequency (%)
liver 140
 
0.7%
peritoneum 128
 
0.7%
NS 75
 
0.4%
omentum 14
 
0.1%
stomach 12
 
0.1%
mesentery 11
 
0.1%
abdomen 10
 
0.1%
colon 7
 
< 0.1%
pelvis 5
 
< 0.1%
lymph node 4
 
< 0.1%
Other values (34) 54
 
0.3%
(Missing) 18912
97.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
liver 140
28.9%
peritoneum 128
26.4%
ns 75
15.5%
omentum 15
 
3.1%
abdomen 13
 
2.7%
stomach 12
 
2.5%
mesentery 11
 
2.3%
colon 7
 
1.4%
skin 6
 
1.2%
pelvis 5
 
1.0%
Other values (38) 72
14.9%

Most occurring characters

ValueCountFrequency (%)
e 517
16.2%
i 328
10.3%
r 313
9.8%
n 224
 
7.0%
m 222
 
7.0%
o 206
 
6.5%
t 205
 
6.4%
l 201
 
6.3%
u 163
 
5.1%
p 163
 
5.1%
Other values (18) 650
20.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3192
100.0%


TUMOUR_REMARK
Categorical

High correlation  Missing 

Distinct44
Distinct (%)4.7%
Missing18428
Missing (%)95.1%
Memory size1.4 MiB
Sample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11) 353
Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18. (Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18.) 200
Sample has Wildtype KIT exons 9 and 11 (Sample has Wildtype KIT exons 9 and 11) 94
Sample has a KIT exon 11 mutation (Sample has a KIT exon 11 mutation) 60
Advanced GIST 58
Other values (39) 179

Length

Max length201
Median length175
Mean length91.681144
Min length13

Characters and Unicode

Total characters86547
Distinct characters56
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique27 ?
Unique (%)2.9%

Sample

1st rowTumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18. (Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18.)
2nd rowSample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11)
3rd rowSample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11)
4th rowSample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11)
5th rowSample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11)

Common Values

ValueCountFrequency (%)
Sample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11) 353
 
1.8%
Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18. (Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18.) 200
 
1.0%
Sample has Wildtype KIT exons 9 and 11 (Sample has Wildtype KIT exons 9 and 11) 94
 
0.5%
Sample has a KIT exon 11 mutation (Sample has a KIT exon 11 mutation) 60
 
0.3%
Advanced GIST 58
 
0.3%
Succinate B dehydrogenase positive tumour 38
 
0.2%
multifocal (multifocal) 24
 
0.1%
At time of surgery (GIST ruptured before or during surgery) 21
 
0.1%
Juvenile JPA (KIT negative) 20
 
0.1%
Multicentric GIST (Multicentric GIST) 12
 
0.1%
Other values (34) 64
 
0.3%
(Missing) 18428
95.1%
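
Many TUMOUR_REMARK values have the shape "text (text)". In most of them the parenthetical repeats the main text, but in the most common value the two parts differ ("a KIT exon 11 mutation" against "a Wildtype KIT exon 11"). The sketch here splits the two parts and flags rows where they disagree. The column names `main` and `note` are hypothetical, and the report does not say whether a disagreement indicates a curation problem.

```python
import pandas as pd

remarks = pd.Series([
    "Sample has a KIT exon 11 mutation (Sample has a Wildtype KIT exon 11)",
    "Sample has a KIT exon 11 mutation (Sample has a KIT exon 11 mutation)",
    "Advanced GIST",
])

# Text before the first "(" goes to main; everything up to the final ")" goes to note.
parts = remarks.str.extract(r"^(?P<main>.*?)\s*\((?P<note>.*)\)$")

# Flag rows that have a parenthetical differing from the main text.
differs = parts["note"].notna() & (parts["main"] != parts["note"])
```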

Length

Histogram of lengths of the category
ValueCountFrequency (%)
kit 1434
 
9.0%
has 1016
 
6.4%
sample 1014
 
6.4%
11 1014
 
6.4%
and 998
 
6.3%
a 828
 
5.2%
exon 826
 
5.2%
wildtype 541
 
3.4%
tumour 504
 
3.2%
mutation 475
 
3.0%
Other values (107) 7235
45.5%

Most occurring characters

ValueCountFrequency (%)
(space) 14942
17.3%
e 8712
 
10.1%
a 5966
 
6.9%
n 4524
 
5.2%
1 4452
 
5.1%
t 3697
 
4.3%
i 3307
 
3.8%
o 3282
 
3.8%
s 2741
 
3.2%
m 2632
 
3.0%
Other values (46) 32292
37.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 86547
100.0%


ETHNICITY
Categorical

High correlation  Missing 

Distinct14
Distinct (%)0.8%
Missing17527
Missing (%)90.5%
Memory size1.3 MiB
Chinese 724
Korean 333
Slovakian 278
Japanese 168
Italian 83
Other values (9) 259

Length

Max length10
Median length9
Mean length7.4243902
Min length5

Characters and Unicode

Total characters13698
Distinct characters31
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.1%

Sample

1st rowTaiwanese
2nd rowSlovakian
3rd rowCaucasian
4th rowItalian
5th rowSlovakian

Common Values

ValueCountFrequency (%)
Chinese 724
 
3.7%
Korean 333
 
1.7%
Slovakian 278
 
1.4%
Japanese 168
 
0.9%
Italian 83
 
0.4%
Portuguese 78
 
0.4%
Caucasian 42
 
0.2%
Panamanian 39
 
0.2%
Taiwanese 38
 
0.2%
Greek 38
 
0.2%
Other values (4) 24
 
0.1%
(Missing) 17527
90.5%
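
The table above lists the ten most frequent values and folds the remaining four into an "Other values (4)" row. The ten listed counts sum to 1821, and adding the 24 others gives 1845, which equals 19372 minus the 17527 missing cells. A small sketch of building such a summary with pandas follows. The function name `top_n_with_other` and the toy counts are assumptions for illustration.

```python
import pandas as pd


def top_n_with_other(values: pd.Series, n: int = 10) -> pd.Series:
    """Return the n most frequent values plus one aggregated 'Other values' row."""
    counts = values.value_counts()  # missing values are excluded by default
    head, tail = counts.iloc[:n], counts.iloc[n:]
    if tail.empty:
        return head
    other = pd.Series({f"Other values ({len(tail)})": int(tail.sum())})
    return pd.concat([head, other])


# Toy data with made-up counts; the labels are taken from the table above.
toy = pd.Series(
    ["Chinese"] * 5 + ["Korean"] * 3 + ["Slovakian"] * 2 + ["Japanese", "Italian", None]
)
summary = top_n_with_other(toy, n=3)
```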

Length

Histogram of lengths of the category
ValueCountFrequency (%)
chinese 724
39.2%
korean 333
18.0%
slovakian 278
 
15.1%
japanese 168
 
9.1%
italian 83
 
4.5%
portuguese 78
 
4.2%
caucasian 42
 
2.3%
panamanian 39
 
2.1%
taiwanese 38
 
2.1%
greek 38
 
2.1%
Other values (4) 24
 
1.3%

Most occurring characters

ValueCountFrequency (%)
e 2425
17.7%
n 1807
13.2%
a 1774
13.0%
i 1228
9.0%
s 1069
7.8%
C 766
 
5.6%
h 724
 
5.3%
o 690
 
5.0%
r 449
 
3.3%
l 364
 
2.7%
Other values (21) 2402
17.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 13698
100.0%


ENVIRONMENTAL_VARIABLES
Categorical

High correlation  Missing 

Distinct3
Distinct (%)3.4%
Missing19284
Missing (%)99.5%
Memory size1.3 MiB
Non-smoker 62
Smoker 24
Non smoker 2

Length

Max length10
Median length10
Mean length8.9090909
Min length6

Characters and Unicode

Total characters784
Distinct characters11
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNon-smoker
2nd rowNon-smoker
3rd rowNon-smoker
4th rowSmoker
5th rowNon-smoker

Common Values

ValueCountFrequency (%)
Non-smoker 62
 
0.3%
Smoker 24
 
0.1%
Non smoker 2
 
< 0.1%
(Missing) 19284
99.5%
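
"Non-smoker" and "Non smoker" are counted as two distinct values although they differ only in the separator. A minimal sketch of normalising the spelling before counting is shown here. Merging the two spellings is an assumption about intent, and the series is rebuilt from the counts in the table above.

```python
import pandas as pd

smoking = pd.Series(["Non-smoker"] * 62 + ["Smoker"] * 24 + ["Non smoker"] * 2)

# Lower-case and collapse any run of spaces or hyphens into a single hyphen.
normalised = smoking.str.lower().str.replace(r"[\s-]+", "-", regex=True)
counts = normalised.value_counts()
```

After normalisation the column has two distinct values instead of three.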

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-smoker 62
68.9%
smoker 26
28.9%
non 2
 
2.2%

Most occurring characters

ValueCountFrequency (%)
o 152
19.4%
m 88
11.2%
k 88
11.2%
r 88
11.2%
e 88
11.2%
n 64
8.2%
N 64
8.2%
s 64
8.2%
- 62
7.9%
S 24
 
3.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 784
100.0%


GERMLINE_MUTATION
Categorical

High correlation  Missing 

Distinct5
Distinct (%)31.2%
Missing19356
Missing (%)99.9%
Memory size1.3 MiB
SDHA 10
SDHD 2
SDHD (c.448_449insATCT/p.C150Yfs*42) 2
SDHA (Unknown which curated mutation is germline and which is somatic. ) 1
SDHB 1

Length

Max length72
Median length4
Mean length12.25
Min length4

Characters and Unicode

Total characters196
Distinct characters42
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)12.5%

Sample

1st rowSDHA
2nd rowSDHA
3rd rowSDHA
4th rowSDHA
5th rowSDHA

Common Values

ValueCountFrequency (%)
SDHA 10
 
0.1%
SDHD 2
 
< 0.1%
SDHD (c.448_449insATCT/p.C150Yfs*42) 2
 
< 0.1%
SDHA (Unknown which curated mutation is germline and which is somatic. ) 1
 
< 0.1%
SDHB 1
 
< 0.1%
(Missing) 19356
99.9%
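
The five distinct values mix a bare gene symbol with optional free-text annotation in parentheses. The sketch here pulls out the leading gene symbol so that annotated and unannotated entries can be counted together. Treating the first run of word characters as the gene symbol is an assumption that happens to fit the values listed above.

```python
import pandas as pd

germline = pd.Series(
    ["SDHA"] * 10
    + ["SDHD"] * 2
    + ["SDHD (c.448_449insATCT/p.C150Yfs*42)"] * 2
    + [
        "SDHA (Unknown which curated mutation is germline and which is somatic. )",
        "SDHB",
    ]
)

# Leading run of word characters, taken here to be the gene symbol.
gene = germline.str.extract(r"^(\w+)", expand=False)
gene_counts = gene.value_counts()
```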

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
sdha 11
37.9%
sdhd 4
 
13.8%
c.448_449insatct/p.c150yfs*42 2
 
6.9%
which 2
 
6.9%
is 2
 
6.9%
unknown 1
 
3.4%
curated 1
 
3.4%
mutation 1
 
3.4%
germline 1
 
3.4%
and 1
 
3.4%
Other values (3) 3
 
10.3%

Most occurring characters

ValueCountFrequency (%)
D 20
 
10.2%
S 16
 
8.2%
H 16
 
8.2%
A 13
 
6.6%
(space) 13
 
6.6%
4 10
 
5.1%
i 9
 
4.6%
n 8
 
4.1%
s 7
 
3.6%
c 6
 
3.1%
Other values (32) 78
39.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 196
100.0%


THERAPY
Categorical

High correlation  Imbalance  Missing 

Distinct10
Distinct (%)0.8%
Missing18066
Missing (%)93.3%
Memory size1.4 MiB
No prior imatinib therapy 1100
No prior therapy 155
Possible Imatinib intolerant individual rather than primary or secondary non-response 18
Prior imatinib therapy 17
GIST treated with imatinib prior to desmoid tumour development 11
Other values (5) 5

Length

Max length96
Median length25
Mean length25.24732
Min length16

Characters and Unicode

Total characters32973
Distinct characters37
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)0.4%

Sample

1st rowNo prior imatinib therapy
2nd rowNo prior imatinib therapy
3rd rowNo prior imatinib therapy
4th rowNo prior imatinib therapy
5th rowNo prior imatinib therapy

Common Values

ValueCountFrequency (%)
No prior imatinib therapy 1100
 
5.7%
No prior therapy 155
 
0.8%
Possible Imatinib intolerant individual rather than primary or secondary non-response 18
 
0.1%
Prior imatinib therapy 17
 
0.1%
GIST treated with imatinib prior to desmoid tumour development 11
 
0.1%
Imatinib and erlotinib therapy alternated 2:4 weeks respectively 1
 
< 0.1%
Individual has been treated with sunitinib after progression on imatinib 1
 
< 0.1%
Prior imatinib therapy (intolerent of high dose imatinib therapy or progressive disaese) 1
 
< 0.1%
Individual was treated with surgery and neoadjuvant imatinib for gastrointestinal stromal tumour 1
 
< 0.1%
Individual has been treated with radiotherapy for endometrial carcinoma 7 years earlier 1
 
< 0.1%
(Missing) 18066
93.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
prior 1284
24.5%
therapy 1275
24.3%
no 1255
23.9%
imatinib 1151
21.9%
individual 21
 
0.4%
or 19
 
0.4%
intolerant 18
 
0.3%
rather 18
 
0.3%
than 18
 
0.3%
possible 18
 
0.3%
Other values (40) 172
 
3.3%

Most occurring characters

ValueCountFrequency (%)
i 4874
14.8%
r 4036
12.2%
(space) 3943
12.0%
o 2709
8.2%
p 2592
7.9%
t 2589
7.9%
a 2572
7.8%
e 1485
 
4.5%
h 1330
 
4.0%
n 1327
 
4.0%
Other values (27) 5516
16.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 32973
100.0%


FAMILY
Categorical

High correlation  Imbalance  Missing 

Distinct10
Distinct (%)7.6%
Missing19240
Missing (%)99.3%
Memory size1.3 MiB
Sample from an individual with type 1 neurofibromatosis 113
No family history of gastrointestinal stromal tumour or paraganglioma 9
No family history of cancer 3
Colon cancer (father) 1
Brother of sample 1117717 1
Other values (5) 5

Length

Max length123
Median length55
Mean length54.340909
Min length17

Characters and Unicode

Total characters7173
Distinct characters44
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7 ?
Unique (%)5.3%

Sample

1st rowNo family history of gastrointestinal stromal tumour or paraganglioma
2nd rowSample from an individual with type 1 neurofibromatosis
3rd rowNo family history of cancer
4th rowColon cancer (father)
5th rowSample from an individual with type 1 neurofibromatosis

Common Values

ValueCountFrequency (%)
Sample from an individual with type 1 neurofibromatosis 113
 
0.6%
No family history of gastrointestinal stromal tumour or paraganglioma 9
 
< 0.1%
No family history of cancer 3
 
< 0.1%
Colon cancer (father) 1
 
< 0.1%
Brother of sample 1117717 1
 
< 0.1%
Son of sample 1029463 1
 
< 0.1%
Individual from a family with no known family history of gastrointestinal stromal tumour, paraganglioma or pheochromocytoma 1
 
< 0.1%
Brother of sample 1117718 1
 
< 0.1%
Father of sample 1029464 1
 
< 0.1%
Lung cancer (FDR) 1
 
< 0.1%
(Missing) 19240
99.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
sample 117
11.3%
from 114
11.0%
individual 114
11.0%
with 114
11.0%
an 113
10.9%
type 113
10.9%
1 113
10.9%
neurofibromatosis 113
10.9%
of 17
 
1.6%
family 14
 
1.3%
Other values (21) 96
9.2%

Most occurring characters

ValueCountFrequency (%)
(space) 906
12.6%
i 738
 
10.3%
o 556
 
7.8%
a 550
 
7.7%
r 415
 
5.8%
t 408
 
5.7%
m 390
 
5.4%
n 381
 
5.3%
e 363
 
5.1%
l 276
 
3.8%
Other values (34) 2190
30.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 7173
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
906
12.6%
i 738
 
10.3%
o 556
 
7.8%
a 550
 
7.7%
r 415
 
5.8%
t 408
 
5.7%
m 390
 
5.4%
n 381
 
5.3%
e 363
 
5.1%
l 276
 
3.8%
Other values (34) 2190
30.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 7173
100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 7173 | 100.0%

INDIVIDUAL_REMARK
Categorical

High correlation  Missing 

Distinct: 36
Distinct (%): 15.2%
Missing: 19135
Missing (%): 98.8%
Memory size: 1.3 MiB

Value | Count
Age=Adult 19-83 years | 58
Age=Adult | 39
Individual has Carney triad | 34
Individual has NF1 | 21
No history of FAP | 13
Other values (31) | 72

Length

Max length: 91
Median length: 75
Mean length: 22.729958
Min length: 6

Characters and Unicode

Total characters: 5387
Distinct characters: 52
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
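
As a rough illustration of the character properties mentioned above, the following sketch tallies Unicode general categories using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the two input strings are simply copied from this variable's common values. This is not necessarily how the report computes its figures, and the standard library exposes general categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for text in values:
        for ch in text:
            # unicodedata.category returns a two-letter code such as
            # "Lu" (uppercase letter), "Nd" (decimal digit) or "Zs" (space).
            counts[unicodedata.category(ch)] += 1
    return counts


# Two remark strings taken from the common values of INDIVIDUAL_REMARK.
remarks = ["Age=Adult 19-83 years", "Individual has NF1"]
counts = category_counts(remarks)

for code, n in counts.most_common():
    print(code, n)
```

Applied to a whole column, the same tally would give a per-category breakdown in place of a single "(unknown)" bucket.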

Unique

Unique: 21
Unique (%): 8.9%

Sample

1st row: Individual was under 21
2nd row: Age=Adult 19-83 years
3rd row: Age=Adult 19-83 years
4th row: Age=Adult 19-83 years
5th row: Age=Middle-aged

Common Values

Value | Count | Frequency (%)
Age=Adult 19-83 years | 58 | 0.3%
Age=Adult | 39 | 0.2%
Individual has Carney triad | 34 | 0.2%
Individual has NF1 | 21 | 0.1%
No history of FAP | 13 | 0.1%
Individual was under 21 | 13 | 0.1%
No family history of NF1 | 10 | 0.1%
Sporadic NF1 | 7 | < 0.1%
Individual has neurofibromatosis type 1 | 6 | < 0.1%
Remark | 4 | < 0.1%
Other values (26) | 32 | 0.2%
(Missing) | 19135 | 98.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
age=adult | 97 | 12.1%
individual | 92 | 11.5%
has | 77 | 9.6%
years | 59 | 7.4%
19-83 | 58 | 7.2%
carney | 40 | 5.0%
triad | 40 | 5.0%
nf1 | 39 | 4.9%
of | 26 | 3.2%
no | 24 | 3.0%
Other values (83) | 249 | 31.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 564 | 10.5%
a | 440 | 8.2%
d | 389 | 7.2%
i | 352 | 6.5%
e | 319 | 5.9%
r | 269 | 5.0%
l | 249 | 4.6%
t | 231 | 4.3%
s | 224 | 4.2%
n | 224 | 4.2%
Other values (42) | 2126 | 39.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 5387 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 5387 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 5387 | 100.0%

Interactions

[Figures: 9 interaction plots]

Correlations

Variable | AGE | CYTOGENETICS | ENVIRONMENTAL_VARIABLES | ETHNICITY | FAMILY | GENDER | GERMLINE_MUTATION | GRADE | INDIVIDUAL_ID | INDIVIDUAL_REMARK | METASTATIC_SITE | MUTATION_ALLELE_SPECIFICATION | NORMAL_TISSUE_TESTED | RNASEQ_SCREEN | SAMPLE_TYPE | STAGE | TARGETED_SCREEN | THERAPY | TUMOUR_ID | TUMOUR_REMARK | TUMOUR_SOURCE | WHOLE_EXOME_SCREEN
AGE | 1.000 | 0.155 | 0.000 | 0.225 | 0.311 | 0.104 | 0.000 | 0.000 | -0.024 | 0.447 | 0.106 | 0.158 | 0.144 | 0.000 | 0.105 | 0.175 | 0.034 | 0.095 | -0.034 | 0.484 | 0.115 | 0.051
CYTOGENETICS | 0.155 | 1.000 | 0.000 | 0.000 | 0.000 | 0.704 | 0.000 | 0.000 | 0.987 | 0.000 | 0.519 | 0.000 | 0.000 | 1.000 | 0.689 | 0.000 | 0.445 | 0.000 | 0.987 | 0.000 | 0.345 | 0.445
ENVIRONMENTAL_VARIABLES | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.171 | 0.000 | 0.000 | 0.721 | 0.000 | 0.205 | 0.000 | 0.057 | 1.000 | 0.185 | NaN | 1.000 | 0.000 | 0.721 | 0.000 | 0.701 | 1.000
ETHNICITY | 0.225 | 0.000 | 0.000 | 1.000 | NaN | 0.448 | 0.000 | 0.000 | 0.716 | 0.347 | 0.667 | 1.000 | 0.962 | 1.000 | 0.373 | 0.000 | 1.000 | 0.963 | 0.692 | 0.000 | 0.581 | 1.000
FAMILY | 0.311 | 0.000 | 0.000 | NaN | 1.000 | 0.192 | 1.000 | 1.000 | 0.795 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.670 | 0.000 | 1.000 | 0.000 | 0.788 | 1.000 | 0.198 | 1.000
GENDER | 0.104 | 0.704 | 0.171 | 0.448 | 0.192 | 1.000 | 0.000 | 0.000 | 0.217 | 0.763 | 0.213 | 0.480 | 0.014 | 0.038 | 0.151 | 0.602 | 0.119 | 0.496 | 0.217 | 0.808 | 0.285 | 0.113
GERMLINE_MUTATION | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.732 | NaN | NaN | 0.000 | 0.886 | 1.000 | 0.482 | 0.000 | 1.000 | 0.000 | 0.732 | 0.000 | 0.519 | 1.000
GRADE | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.692 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.702 | 1.000 | 1.000 | 0.000 | 0.692 | 0.000 | 0.156 | 1.000
INDIVIDUAL_ID | -0.024 | 0.987 | 0.721 | 0.716 | 0.795 | 0.217 | 0.732 | 0.692 | 1.000 | 0.850 | 0.399 | 0.837 | 0.638 | 0.037 | 0.283 | 0.564 | 0.147 | 0.582 | 0.996 | 0.976 | 0.333 | 0.157
INDIVIDUAL_REMARK | 0.447 | 0.000 | 0.000 | 0.347 | 0.000 | 0.763 | NaN | 1.000 | 0.850 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.898 | NaN | 1.000 | 0.991 | 0.849 | 0.456 | 0.779 | 1.000
METASTATIC_SITE | 0.106 | 0.519 | 0.205 | 0.667 | 1.000 | 0.213 | NaN | 0.000 | 0.399 | 0.000 | 1.000 | 1.000 | 0.219 | 1.000 | 0.295 | 0.000 | 0.357 | 0.488 | 0.388 | 1.000 | 0.525 | 0.357
MUTATION_ALLELE_SPECIFICATION | 0.158 | 0.000 | 0.000 | 1.000 | 0.000 | 0.480 | 0.000 | 0.000 | 0.837 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.912 | 0.000 | 1.000 | 0.000 | 0.975 | 0.000 | 0.853 | 1.000
NORMAL_TISSUE_TESTED | 0.144 | 0.000 | 0.057 | 0.962 | 0.000 | 0.014 | 0.886 | 1.000 | 0.638 | 1.000 | 0.219 | 0.000 | 1.000 | 0.027 | 0.241 | 0.816 | 0.133 | 1.000 | 0.638 | 0.000 | 0.464 | 0.125
RNASEQ_SCREEN | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.038 | 1.000 | 1.000 | 0.037 | 1.000 | 1.000 | 1.000 | 0.027 | 1.000 | 0.060 | 1.000 | 0.309 | 1.000 | 0.037 | 1.000 | 0.000 | 0.000
SAMPLE_TYPE | 0.105 | 0.689 | 0.185 | 0.373 | 0.670 | 0.151 | 0.482 | 0.702 | 0.283 | 0.898 | 0.295 | 0.912 | 0.241 | 0.060 | 1.000 | 0.149 | 0.042 | 0.257 | 0.283 | 0.917 | 0.097 | 0.034
STAGE | 0.175 | 0.000 | NaN | 0.000 | 0.000 | 0.602 | 0.000 | 1.000 | 0.564 | NaN | 0.000 | 0.000 | 0.816 | 1.000 | 0.149 | 1.000 | 1.000 | 0.000 | 0.564 | 0.000 | 0.529 | 1.000
TARGETED_SCREEN | 0.034 | 0.445 | 1.000 | 1.000 | 1.000 | 0.119 | 1.000 | 1.000 | 0.147 | 1.000 | 0.357 | 1.000 | 0.133 | 0.309 | 0.042 | 1.000 | 1.000 | 1.000 | 0.137 | 1.000 | 0.101 | 0.934
THERAPY | 0.095 | 0.000 | 0.000 | 0.963 | 0.000 | 0.496 | 0.000 | 0.000 | 0.582 | 0.991 | 0.488 | 0.000 | 1.000 | 1.000 | 0.257 | 0.000 | 1.000 | 1.000 | 0.582 | 0.922 | 0.609 | 1.000
TUMOUR_ID | -0.034 | 0.987 | 0.721 | 0.692 | 0.788 | 0.217 | 0.732 | 0.692 | 0.996 | 0.849 | 0.388 | 0.975 | 0.638 | 0.037 | 0.283 | 0.564 | 0.137 | 0.582 | 1.000 | 0.974 | 0.330 | 0.148
TUMOUR_REMARK | 0.484 | 0.000 | 0.000 | 0.000 | 1.000 | 0.808 | 0.000 | 0.000 | 0.976 | 0.456 | 1.000 | 0.000 | 0.000 | 1.000 | 0.917 | 0.000 | 1.000 | 0.922 | 0.974 | 1.000 | 0.977 | 1.000
TUMOUR_SOURCE | 0.115 | 0.345 | 0.701 | 0.581 | 0.198 | 0.285 | 0.519 | 0.156 | 0.333 | 0.779 | 0.525 | 0.853 | 0.464 | 0.000 | 0.097 | 0.529 | 0.101 | 0.609 | 0.330 | 0.977 | 1.000 | 0.108
WHOLE_EXOME_SCREEN | 0.051 | 0.445 | 1.000 | 1.000 | 1.000 | 0.113 | 1.000 | 1.000 | 0.157 | 1.000 | 0.357 | 1.000 | 0.125 | 0.000 | 0.034 | 1.000 | 0.934 | 1.000 | 0.148 | 1.000 | 0.108 | 1.000
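
To make a matrix like the one above easier to scan, the sketch below lists variable pairs whose absolute coefficient meets a cutoff. The function name `strong_pairs` and the 0.9 threshold are illustrative assumptions, not the rule the report uses for its "High correlation" alerts. The input is a small excerpt of four variables copied from the matrix.

```python
from itertools import combinations


def strong_pairs(matrix, threshold=0.9):
    """Return (a, b, value) for each unordered pair with |value| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(matrix), 2):
        value = matrix[a].get(b)
        # Skip cells that are absent or NaN (NaN is the only value != itself).
        if value is None or value != value:
            continue
        if abs(value) >= threshold:
            pairs.append((a, b, value))
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Excerpt of the correlation matrix (values copied from the report).
excerpt = {
    "AGE": {"GENDER": 0.104, "INDIVIDUAL_ID": -0.024, "TUMOUR_ID": -0.034},
    "GENDER": {"AGE": 0.104, "INDIVIDUAL_ID": 0.217, "TUMOUR_ID": 0.217},
    "INDIVIDUAL_ID": {"AGE": -0.024, "GENDER": 0.217, "TUMOUR_ID": 0.996},
    "TUMOUR_ID": {"AGE": -0.034, "GENDER": 0.217, "INDIVIDUAL_ID": 0.996},
}

result = strong_pairs(excerpt, threshold=0.9)
print(result)
```

In this excerpt only the INDIVIDUAL_ID and TUMOUR_ID pair clears the cutoff.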

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
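
One common way to read nullity correlation is as the Pearson correlation between per-column "is missing" indicators. The sketch below computes it with the standard library only. The function name `nullity_correlation` and the four rows are made up for illustration (only the column names come from this dataset), and this is not claimed to be the exact computation behind the heatmap.

```python
from math import sqrt


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Returns None when either column is always present or always missing,
    because the correlation is undefined for a constant indicator.
    """
    a = [1.0 if row.get(col_a) is None else 0.0 for row in rows]
    b = [1.0 if row.get(col_b) is None else 0.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical rows; None marks a missing cell.
rows = [
    {"AGE": 72.0, "THERAPY": "No prior imatinib therapy", "MSI": "Unknown"},
    {"AGE": None, "THERAPY": "No prior imatinib therapy", "MSI": "Unknown"},
    {"AGE": None, "THERAPY": None, "MSI": "Unknown"},
    {"AGE": 68.0, "THERAPY": "No prior imatinib therapy", "MSI": "Unknown"},
]

r = nullity_correlation(rows, "AGE", "THERAPY")
print(r)
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a column with no variation in missingness yields no coefficient.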

Sample

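First rows

(index) | COSMIC_SAMPLE_ID | SAMPLE_NAME | COSMIC_PHENOTYPE_ID | TUMOUR_ID | SAMPLE_TYPE | INDIVIDUAL_ID | WHOLE_GENOME_SCREEN | WHOLE_EXOME_SCREEN | TARGETED_SCREEN | RNASEQ_SCREEN | REARRANGEMENT_SCREEN | TUMOUR_SOURCE | NORMAL_TISSUE_TESTED | GENDER | AGE | THERAPY_RELATIONSHIP | SAMPLE_DIFFERENTIATOR | MUTATION_ALLELE_SPECIFICATION | MSI | AVERAGE_PLOIDY | SAMPLE_REMARK | DRUG_RESPONSE | GRADE | AGE_AT_TUMOUR_RECURRENCE | STAGE | CYTOGENETICS | METASTATIC_SITE | TUMOUR_REMARK | ETHNICITY | ENVIRONMENTAL_VARIABLES | GERMLINE_MUTATION | THERAPY | FAMILY | INDIVIDUAL_REMARK
11 | COSS1496935 | 1496935 | COSO31355381 | 1419814 | surgery-fixed | 1368880 | n | n | y | n | n | primary | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
64 | COSS1036427 | 1036427 | COSO28815381 | 953184 | surgery-fixed | 930275 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18. (Tumour preselected as being negative for mutations in KIT e9, e11, e13, e17 and PDGFRA e12 and e18.) | NaN | NaN | NaN | NaN | NaN | NaN
120 | COSS1496985 | 1496985 | COSO31355381 | 1419864 | surgery-fixed | 1368930 | n | n | y | n | n | primary | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
306 | COSS2468028 | 2468028 | COSO36075381 | 2330859 | surgery-fixed | 2180560 | n | n | y | n | n | primary | NaN | m | 72.0 | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
387 | COSS820950 | E17588 | COSO28775381 | 740710 | surgery-fixed | 723216 | n | n | y | n | n | primary | n | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
434 | COSS1731941 | 1731941 | COSO31355381 | 1637907 | surgery-fixed | 1573598 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | No prior imatinib therapy | NaN | NaN
528 | COSS1384396 | RYU07-20 | COSO36075381 | 1294437 | surgery - NOS | 1261383 | n | n | y | n | n | primary | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
566 | COSS1544835 | 1544835 | COSO36605381 | 1466995 | surgery-fixed | 1413601 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | Imatinib clinical primary non response (local progression) | NaN | NaN | NaN | NaN | NaN | NaN | Taiwanese | NaN | NaN | NaN | NaN | NaN
719 | COSS2611118 | 2611118 | COSO31355381 | 2471936 | surgery-fixed | 2321371 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
892 | COSS1284235 | 1284235 | COSO28815763 | 1195530 | surgery fresh/frozen | 1167235 | n | n | y | n | n | NS | NaN | f | 68.0 | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | Some Grade data are given in publication | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN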
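
Last rows

(index) | COSMIC_SAMPLE_ID | SAMPLE_NAME | COSMIC_PHENOTYPE_ID | TUMOUR_ID | SAMPLE_TYPE | INDIVIDUAL_ID | WHOLE_GENOME_SCREEN | WHOLE_EXOME_SCREEN | TARGETED_SCREEN | RNASEQ_SCREEN | REARRANGEMENT_SCREEN | TUMOUR_SOURCE | NORMAL_TISSUE_TESTED | GENDER | AGE | THERAPY_RELATIONSHIP | SAMPLE_DIFFERENTIATOR | MUTATION_ALLELE_SPECIFICATION | MSI | AVERAGE_PLOIDY | SAMPLE_REMARK | DRUG_RESPONSE | GRADE | AGE_AT_TUMOUR_RECURRENCE | STAGE | CYTOGENETICS | METASTATIC_SITE | TUMOUR_REMARK | ETHNICITY | ENVIRONMENTAL_VARIABLES | GERMLINE_MUTATION | THERAPY | FAMILY | INDIVIDUAL_REMARK
1557588 | COSS2432123 | 2432123 | COSO31355381 | 2295026 | surgery-fixed | 2146615 | n | n | y | n | n | NS | NaN | u | NaN | Sample analysed before imatinib therapy | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1557594 | COSS2482076 | 2482076 | COSO31355381 | 2344806 | surgery fresh/frozen | 2194295 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1557680 | COSS1929167 | 1929167 | COSO36605381 | 1816457 | surgery - NOS | 1729772 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1557745 | COSS2503154 | 2503154 | COSO28775381 | 2365504 | surgery - NOS | 2216705 | n | n | y | n | n | primary | NaN | u | NaN | Sample analysed before imatinib therapy | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | No prior imatinib therapy | NaN | NaN
1557748 | COSS2431593 | 2431593 | COSO31355381 | 2294496 | surgery-fixed | 2146085 | n | n | y | n | n | NS | NaN | u | NaN | Sample analysed before imatinib therapy | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1557901 | COSS2906437 | 2906437 | COSO36605381 | 2760689 | surgery-fixed | 2596118 | n | n | y | n | n | primary | NaN | u | NaN | Sample taken before imatinib therapy | NaN | NaN | Unknown | NaN | NaN | Imatinib clinical response - not further specified | NaN | NaN | NaN | NaN | NaN | NaN | Chinese | NaN | NaN | NaN | NaN | NaN
1558040 | COSS2402260 | 2402260 | COSO36605381 | 2265118 | surgery-fixed | 2115894 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1558051 | COSS1731926 | 1731926 | COSO31355381 | 1637892 | surgery-fixed | 1573583 | n | n | y | n | n | NS | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | No prior imatinib therapy | NaN | NaN
1558255 | COSS1479674 | 1479674 | COSO28815385 | 1403334 | surgery-fixed | 1353085 | n | n | y | n | n | primary | NaN | u | NaN | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1558260 | COSS1182421 | 1182421 | COSO28775381 | 1094392 | surgery-fixed | 1067674 | n | n | y | n | n | NS | NaN | m | 72.0 | NaN | NaN | NaN | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Urinary bladder carcinoma since 32 months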